Abstract

The Sliding Window Secretary Problem adds a window of choices to the Classical Secretary
Problem: the interviewer has the option to choose any of the K choices immediately prior to the
current choice. We consider a case of this sequential choice problem in which the interviewer
has a finite, known number of choices and can only discern the relative ranks of choices, and in 
which every permutation of ranks is equally likely. We examine three cases of the problem: (i) 
the interviewer has one choice to choose the best applicant; (ii) the interviewer has one choice to 
choose one of the top two applicants; and (iii) the interviewer has two choices to choose the best 
applicant. The form of the optimal strategy is shown, the probability of winning as a function 
of the window size is derived, and the limiting behavior is discussed for all three cases.

1 Introduction


The classical secretary problem is a well-known decision theory problem, and the solution to the 
problem was first proven by Lindley (1961) and Dynkin (1963). Ferguson presents the problem as 
follows [6]: an interviewer sees a sequence of N applicants one at a time, and must decide whether to 
accept or to reject an applicant immediately after seeing the applicant. The interviewer’s decision 
is solely based on the relative ranks of previous applicants. No rejected applicant can be recalled, 
and the interviewer must make exactly one choice. Success occurs if the top applicant is chosen. 
For large N, the optimal strategy for the problem is the following threshold rule: reject the first
$d \approx N/e$ applicants, and then choose the next applicant who is better than all previous
applicants. The interviewer wins with probability approximately $1/e \approx 0.37$ with this strategy.
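The threshold rule above can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the standard exact expression $\Pr(\text{Win} \mid d) = \frac{d}{N}\sum_{j=d+1}^{N}\frac{1}{j-1}$ for $d \ge 1$ (and $1/N$ for $d = 0$), and the function names are illustrative.

```python
from fractions import Fraction

def classical_win_probability(n, d):
    """Exact Pr(Win) when the first d of n applicants are rejected and the
    next applicant better than all of their predecessors is chosen."""
    if d == 0:
        return Fraction(1, n)  # the first applicant is always chosen
    # The best applicant is at index j with probability 1/n and is reached
    # only if the best of the first j-1 applicants is among the first d.
    return sum(Fraction(d, n * (j - 1)) for j in range(d + 1, n + 1))

def best_threshold(n):
    """Return (d, Pr(Win)) for the threshold with the highest win probability."""
    return max(((d, classical_win_probability(n, d)) for d in range(n)),
               key=lambda pair: pair[1])

d_star, p_star = best_threshold(100)
print(d_star, float(p_star))
```

For N = 100 the maximizing threshold is close to $N/e$ and the probability is close to $1/e$, in line with the large-N statement.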

The classical secretary problem has many applications. For example, the classical secretary 
problem has been applied to the behavior of a person searching for the best gas station or best 
restaurant after agreeing to look through a fixed number of stores. In fact, Seale and Rapoport
[12] found that when presented with a scenario equivalent to the secretary problem, a majority of
the fifty people in the study used a threshold rule, with the deviation of their threshold from the
optimal threshold accounted for by an additional cost for the time spent before making a decision.
The classical secretary problem can also be applied to data stream mining, in which a sampler
collects and analyzes data in real time from sensors, computer programs, or web traffic. For example,
Girdhar and Dudek [8] used a version of the secretary problem to model the optimal strategy for
a robot probing a landscape to find the best location to place a sensor by taking a large number
of pictures and assigning a score to each picture based on the variety of colors. In addition, Das
[4] experimentally tested an algorithm that used the optimal strategy from a secretary problem to
collect plankton that best represented a species responsible for toxic algal blooms.

However, the classical secretary problem does not perfectly apply to the above situations.
Realistically, the interviewer would have more time to decide on an applicant. Similarly, a person
deciding while driving whether to stop at a particular gas station or restaurant would have some
ability to backtrack and choose a previous store. As a result, Seale and Rapoport's findings [12]
could be extended to more realistic scenarios if the decision-maker were given more time to make
a decision. In addition, providing the decision-maker with more time would be beneficial for data
stream mining. Ajtai, Megiddo, and Waarts [1] note that the classical secretary problem could be
applied to choosing records of highest interest from a large data set or choosing images from a large
digital library, but also that allowing for limited backtracking would make the application more
realistic. We thus consider a secretary problem proposed in 2009 by Becchetti and Koutsoupias [2]
in which the interviewer can keep the last K applicants as possible choices and hence has a sliding 
window of size K. 

In this paper, we study two cases of the Sliding-Window Secretary Problem with a fixed window
size of K: the Best-1 case, with a payoff of 1 for choosing the best applicant and 0 otherwise, and
the Best-2 case, with a payoff of 1 for choosing one of the top 2 applicants and 0 otherwise. We
additionally study the 2-Choice case, in which the interviewer can choose two applicants and wins
only if either of them is the best applicant overall. We discuss previous variations of the secretary
problem in Section 2. Then, for each case of the Sliding-Window Secretary Problem, we outline the
effect of changes in the window size on the probability of winning, analyze special cases for the
window size, provide a recursive solution that computes the probability of winning, and finally
analyze limiting cases of the recursive solutions. We discuss the Best-1 case in Section 3 and the
Best-2 case in Section 4, followed by the 2-Choice case and then concluding remarks and future
directions.

2 Background and History


The classical secretary problem's solution was first proven by Lindley in 1961 [10] and Dynkin in
1963 [5]. Their results are discussed in Section 3. Many variations of the secretary problem have
been studied in the past 60 years. We highlight a few variations, but not all.

Finding the Best Applicant with Multiple Choices: Gilbert and Mosteller [7] offer a variation of the
secretary problem in which an interviewer can choose r people from a pool of N applicants and
wins if one of the r people is the best applicant. For the case r = 2, they show that for large N
there are two optimal thresholds, one for each choice, $e^{-3/2}N$ and $e^{-1}N$, and an optimal
probability of winning of $e^{-1} + e^{-3/2} \approx 0.591$. They extend their analysis to general r
and find the asymptotic behavior of the problem. Their results are numerically derived and cannot
be explained analytically.

Choosing the Best or the Second Best Applicant: Gilbert and Mosteller also analyze a secretary
problem in which the interviewer wins if the best or second best applicant is chosen. There are
two threshold values, $d_1$ and $d_2$, in the optimal strategy. The interviewer passes $d_1$ applicants,
then chooses the next applicant better than all previous applicants. If no applicant has been chosen
by index $d_2$, the interviewer then chooses the next applicant who is the best or second best out of
all applicants seen so far. They find $d_1 \approx 0.347N$ and $d_2 \approx \frac{2}{3}N$, and that the
optimal probability of success is approximately 0.574 for N large. This problem is discussed in more
detail in the context of the Sliding Window Secretary Problem in Section 4.

The Ability to Recall a Candidate with a Fixed Probability: Another variation allows the interviewer
to recall a previous applicant with a fixed probability, as seen in Petruccelli [11]. The applicant
currently being interviewed can accept the job with a probability of q < 1, and if the interviewer
decides to choose some previous applicant, the probability of the previous applicant accepting the
job is p < q. While the probability of winning increases as p increases, the probability of winning
with a nonzero value of p approaches the probability of winning with p = 0 for N large.

The Best Expected Rank: The payoff for the secretary problem is now the value of the rank of the 
applicant, and the interviewer seeks to minimize the expected rank. Chow, Moriguti, Robbins, and 
Samuels [3] show that as N approaches infinity, the best expected rank approaches 3.87.

Maximizing the Expected Rank with the Ability to Choose More than One Applicant: Ajtai, Megiddo,
and Waarts [1] extend the work of Chow et al. by looking at the best expected rank, given r choices.
They devised algorithms for this process and found that the best expected sum of the $z$-th powers
of the ranks of the r choices is between $\frac{r^{z+1}}{z+1} + O(r^{z})$ and
$\frac{r^{z+1}}{z+1} + C(z)\, r^{z+1/2} \log r$, where $C(z)$ is a value that depends on z.

Recalling Previous Candidates: Using the same payoff as Chow et al., Goldys considered the problem
in which an interviewer tries to achieve the best expected rank with a sliding window of size
2. He showed that as N approaches infinity, the best expected rank approaches approximately 2.57 [9].

3 The Sliding-Window Problem: The Best-1 Case 

In this section we study the Secretary Problem with a Sliding Window of choices. The interviewer 
knows the number of applicants N and can choose any of the last K applicants, for some fixed
K. Let the index of an applicant be its position in a sequence of applicants. Then, we define the
window to be the set of K consecutive applicants that the interviewer can choose from, such that
the smallest index in the window contains the applicant who must be rejected or accepted before a
new applicant can be interviewed. Each applicant has a distinct rank and is seen sequentially in a
randomized order. Let $R(m)$ be a bijective function from $[1, N]$ to $[1, N]$ that returns the absolute
rank of the applicant at index m, with 1 representing the best rank. However, the interviewer 
can only rank the applicants seen so far and thus can only discern relative ranks. We seek the 
optimal strategy for finding the best applicant, in which the payoff is 1 for choosing the best and 
0 otherwise. 

When K = 1, the problem is identical to the Classical Secretary Problem. While Lindley (1961)
and Dynkin (1963) solved the secretary problem earlier, we refer to the 1966 paper of Gilbert
and Mosteller [7]. Gilbert and Mosteller derived the optimal strategy and the optimal thresholds.
For large N it is optimal to pass over approximately $N/e$ applicants and then choose the next best
applicant. This gives Pr(Win) $\approx 1/e$. We present their proofs in Appendix A for intuition for
later proofs.

Let a candidate be an applicant that provides a strictly nonzero probability of winning from the 
perspective of the interviewer if chosen. Specifically in the Best-1 case, a candidate is located in 
the current window and has the best rank out of all seen applicants. Because rejecting applicants 
who are not candidates does not reduce the probability of winning, we adopt a sliding rule in which 
applicants are interviewed to advance the window until a candidate is at the smallest index of the 
window. We now extend the optimal strategy of the Classical Secretary Problem and show that 
in order to maximize the probability of winning the interviewer must reject a particular number of 
applicants and then accept the first candidate to appear, due to the following concept from Gilbert
and Mosteller (1966) [7]: we choose candidate i in our window if and only if

$$\Pr(\text{Win} \mid \text{Choosing Candidate } i) > \Pr(\text{Win} \mid \text{Rejecting Candidate } i), \tag{1}$$

because the interviewer only chooses an applicant that provides a higher probability of winning if
chosen than if rejected.


Theorem 3.1. The optimal strategy for the Best-1 case of the Sliding-Window Secretary Problem
is to reject the first d* applicants for some integer $d^* \ge 0$, and then to choose the next candidate
with the sliding rule.

Proof. Let $S = \{i \in [1, N] \mid \text{Inequality (1) holds}\}$. Because the interviewer has seen
$i + K - 1$ applicants when the window starts at i, Pr(Win | Choosing Candidate i) $= \frac{i+K-1}{N}$.
Because the probability that the best applicant lies between $i + K$ and N decreases as i increases,
Pr(Win | Rejecting Candidate i) decreases in i. If applicant $N - K$ is a candidate, because the last
applicant is the best with probability $\frac{1}{N}$, the probability of winning after rejecting candidate
$N - K$ is $\frac{1}{N}$. Thus Inequality (1) holds for $i = N - K$. A sketch of Pr(Win | Choosing
Candidate i) and Pr(Win | Rejecting Candidate i) is shown in Figure 1 to provide intuition for the
remaining part of the proof. Because Pr(Win | Choosing Candidate i) strictly increases and
Pr(Win | Rejecting Candidate i) decreases in i, all elements in S are consecutive integers. Because S
is nonempty, there is a least element in S, which we call d* + 1. Thus there is a d* such that the
first d* applicants should be rejected, and the first candidate after d* should be accepted. □


By the definition of d* in Theorem 3.1, the optimal strategy for the Best-1 case is to reject the 
first d* applicants and use the sliding rule to accept the next candidate. Note that even though 
the first d* applicants are skipped, their relative ranks are still used to determine if an applicant is 
a candidate. 
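The sliding rule with a threshold can be evaluated exactly for small N by enumerating all orderings. The sketch below is illustrative only and rests on one stated assumption: once the window reaches the last applicant, all applicants have been seen and the interviewer takes the first applicant in the remaining window who is the best overall. This reading agrees with the K = N - 1 special case analyzed in this section. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def stop_index(ranks, k, d):
    """Index (1-based) chosen by the sliding rule, or None if nobody is chosen.

    ranks[m-1] is the absolute rank of applicant m (1 = best). A window that
    starts at s covers s..s+k-1, so applicants 1..min(s+k-1, n) have been seen.
    """
    n = len(ranks)
    for s in range(d + 1, n + 1):
        seen = ranks[:min(s + k - 1, n)]
        if ranks[s - 1] == min(seen):  # applicant s is the best seen so far
            return s
    return None

def exact_win_probability(n, k, d):
    """Fraction of the n! orderings in which the chosen applicant is the best."""
    wins = total = 0
    for ranks in permutations(range(1, n + 1)):
        s = stop_index(ranks, k, d)
        wins += s is not None and ranks[s - 1] == 1
        total += 1
    return Fraction(wins, total)

# Best threshold for each window size when N = 5.
for k in range(1, 5):
    best = max(((d, exact_win_probability(5, k, d)) for d in range(5)),
               key=lambda pair: pair[1])
    print(k, best)
```

The first d applicants are never chosen, but their ranks still enter `min(seen)`, which mirrors the remark that skipped applicants are used to decide whether a later applicant is a candidate.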
Figure 1: A Pictorial Representation of Theorem 3.1. The probability of winning with candidate i
is strictly increasing, while the probability of winning after rejecting candidate i is decreasing. The
threshold value occurs where the two lines meet.


3.1 Special Cases for K 

We first characterize the probability of winning for K = 2 and K = N - 1 in this section.

For a large window size of K = N - 1, the only possible threshold values are 0 or 1. Suppose
d = 1. Failure occurs only when the best applicant is skipped, i.e., $R(1) = 1$. This event occurs
with probability $\frac{1}{N}$. Now suppose d = 0. Failure only occurs if the second best is at index 1
and the best is at index N, i.e., $R(1) = 2$ and $R(N) = 1$. This event occurs with probability
$\frac{1}{N(N-1)}$. Thus, d* = 0.

Now consider a small window size of K = 2. Let j be the index of the best applicant, and d be
an arbitrary threshold. If $j \le d$, the interviewer loses, and if j = d + 1 or j = d + 2, the interviewer
wins. If j > d + 2, the interviewer wins if there are no candidates before j. If there is an applicant
i > d better than all previous applicants, then applicant i + 1 must be better than applicant i so
that applicant i is not a candidate. Thus the applicants between i and j - 1 must form
a sequence of strictly improving ranks. The probability that $R(x)$ is better than the rank of all
preceding applicants is $\frac{1}{x}$, and the probability that i is the first applicant better than the
first d applicants is $\frac{d}{i(i-1)}$. Thus, if i is the first applicant better than the first d
applicants, the probability that there are no candidates before j is

$$\frac{d}{i(i-1)} \prod_{x=i+1}^{j-1} \frac{1}{x} = \frac{d\,(i-2)!}{(j-1)!}.$$

Because each value of j occurs with probability $\frac{1}{N}$, and j = d + 1 and j = d + 2 guarantee
wins, when we sum the probabilities for all values of i and j, we find

$$\Pr(\text{Win} \mid d) = \frac{2}{N} + \frac{d}{N} \sum_{j=d+3}^{N} \frac{\sum_{i=d+1}^{j} (i-2)!}{(j-1)!}, \tag{2}$$

where the term i = j accounts for the case in which no applicant between d + 1 and j - 1 is better
than the first d applicants.
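As a numerical companion to the K = 2 case, the sketch below evaluates the closed form exactly. It assumes Equation (2) reads $\Pr(\text{Win} \mid d) = \frac{2}{N} + \frac{d}{N}\sum_{j=d+3}^{N}\frac{\sum_{i=d+1}^{j}(i-2)!}{(j-1)!}$ and that $1 \le d \le N - 2$, since the derivation uses $\frac{d}{i(i-1)}$ and two guaranteed winning positions; the function name is illustrative.

```python
from fractions import Fraction
from math import factorial

def win_probability_k2(n, d):
    """Exact Pr(Win | d) for window size K = 2, assuming 1 <= d <= n - 2."""
    if not 1 <= d <= n - 2:
        raise ValueError("this form of Equation (2) assumes 1 <= d <= n - 2")
    total = Fraction(2, n)                 # best applicant at index d+1 or d+2
    for j in range(d + 3, n + 1):          # best applicant at index j > d + 2
        inner = sum(factorial(i - 2) for i in range(d + 1, j + 1))
        total += Fraction(d * inner, n * factorial(j - 1))
    return total

# Threshold with the highest win probability for N = 100.
best_d = max(range(1, 99), key=lambda d: win_probability_k2(100, d))
print(best_d, float(win_probability_k2(100, best_d)))
```

For N = 4 and d = 1 the only non-trivial position is j = 4, where the interviewer wins unless applicant 2 is the best of the first three, giving 1/2 + (1/4)(2/3) = 2/3.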


Values of the summation in Equation (2) for N = 100 and various values of d are given in the
appendix.

Analyzing the results of a simulation for small values of N and K in Appendix B suggests that
as K increases for fixed N, Pr(Win) increases and d* decreases. Therefore, we first prove that
Pr(Win) strictly increases as K increases for fixed N.


Lemma 3.2. Let $d_K^*$ and $d_k^*$ be the optimal thresholds for windows K and k respectively. If k < K,
then $\Pr(\text{Win} \mid k, d_k^*) < \Pr(\text{Win} \mid K, d_K^*)$.

Proof. We show that $\Pr(\text{Win} \mid k, d_k^*) < \Pr(\text{Win} \mid K, d_k^*) \le \Pr(\text{Win} \mid K, d_K^*)$.
Because $d_K^*$ is optimal for a window size of K, $\Pr(\text{Win} \mid K, d_k^*) \le \Pr(\text{Win} \mid K, d_K^*)$.
We now prove that $\Pr(\text{Win} \mid k, d_k^*) < \Pr(\text{Win} \mid K, d_k^*)$. We define j to be the index
of the best applicant.

A window of K provides the interviewer with at least the same winning sequences as a window
of k, for the same $d_k^*$, because the interviewer can ignore the last K - k applicants in the window.
In addition, there exists a sequence in which a candidate appears before j - k + 1, but after j - K.
Therefore, with this sequence, the interviewer loses with a window of k but wins with a window of
K. Thus $\Pr(\text{Win} \mid k, d_k^*) < \Pr(\text{Win} \mid K, d_k^*)$. □


Now we prove that d* decreases as K increases for fixed N.


Lemma 3.3. Let $d_K^*$ and $d_k^*$ be the optimal passing thresholds for windows K and k, respectively,
and j be the index of the best applicant. If k < K, then $d_K^* \le d_k^*$.


Proof. From the proof of Theorem 3.1, if applicant i is a candidate, Pr(Win | Choosing i, K) >
Pr(Win | Choosing i, k). Because $j \in [i + k, N]$ occurs with higher probability than
$j \in [i + K, N]$, Pr(Win | Rejecting i, K) < Pr(Win | Rejecting i, k). By Theorem 3.1 the smallest
integer i such that Inequality (1) holds for a window size of k is $d_k^* + 1$. It follows from the
previous inequalities that Inequality (1) holds for a window size of K at index $d_k^* + 1$. Therefore,
because $d_K^* + 1$ is the least index for which Inequality (1) holds if applicant $d_K^* + 1$ is a
candidate, $d_K^* \le d_k^*$. □


Finally we present the exact and asymptotic solutions to the secretary problem for a window
size of $K \ge \frac{N}{2}$.


Theorem 3.4. Let d* be the optimal threshold number of applicants to reject.

(i) If $K > \frac{N}{2}$, then d* = 0.

(ii) If $K = \frac{N}{2}$, then d* = 0 or d* = 1.

(iii) For $N \gg 1$ and $K \ge \frac{N}{2}$, $\Pr(\text{Win}) \approx 2 - \frac{K}{N} + \ln\frac{K}{N}$.

Proof. (i) and (ii): Applicant 1 may or may not be a candidate. First let applicant 1 be a candidate.
If applicant 1 is chosen, Pr(Win) $= \frac{K}{N}$. If applicant 1 is rejected, no candidates are between 2
and K, and so the window slides past K, and the remaining N - K applicants are seen. The best
applicant among the N - K applicants is the best overall with probability $\frac{N-K}{N}$. Thus for
$K > \frac{N}{2}$, applicant 1 should be accepted, and for $K = \frac{N}{2}$, accepting or rejecting applicant 1
provides equal probabilities of winning. Now let applicant 1 not be a candidate. By the sliding rule,
the window starts at some index z > 1 where z is a candidate. The problem reduces to a new problem
with N - z + 1 applicants, window size of K, and a candidate at index 1. Because
$K > \frac{N-z+1}{2}$, the first candidate should be selected. Therefore, d* = 0 for $K > \frac{N}{2}$, and
d* is either 0 or 1 for $K = \frac{N}{2}$.

(iii): The best applicant at index j is guaranteed to be chosen if $j \in [1, K]$, which occurs with
probability $\frac{K}{N}$. If index $j \in [K + 1, N]$, then j will be chosen if no candidates are before
j - K + 1. For m < j - K + 1 the probability that m is a candidate is $\frac{1}{m+K-1}$. By the sliding
rule, there cannot be more than one candidate in $[1, K]$. Therefore, we sum over all
$m \in [1, j - K]$, and take the complement of the sum to find the probability of not stopping before
j. Each value of j occurs with probability $\frac{1}{N}$. By the Total Probability Theorem, we add up the
probability of winning for all possible values of j and find

$$\Pr\left(\text{Win} \,\middle|\, K \ge \tfrac{N}{2}\right) = \frac{K}{N} + \frac{1}{N} \sum_{j=K+1}^{N} \left(1 - \sum_{m=1}^{j-K} \frac{1}{m+K-1}\right). \tag{3}$$


For large N we approximate the sums in Equation (3) as integrals. The worst approximation of
the inner sum occurs when the integral approximates only one term in the summation: $\frac{1}{K}$. Because
the function in the integral has initial value $\frac{1}{K}$, final value $\frac{1}{j-1}$, and is strictly
decreasing, the integral approximates the sum with an error on the order of $\frac{1}{K}$. Therefore,
because N and K are large, the integral approximation is acceptable, and is similarly acceptable for
the outer sum. If we let $x = \frac{K}{N}$, $y = \frac{m}{N}$ and $z = \frac{j}{N}$,

$$\Pr\left(\text{Win} \,\middle|\, K \ge \tfrac{N}{2}\right) \approx x + \int_{x}^{1} \left(1 - \int_{0}^{z-x} \frac{dy}{x+y}\right) dz = 2 - \frac{K}{N} + \ln\frac{K}{N}.$$

□
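To see how quickly the large-N approximation of Theorem 3.4 (iii) becomes accurate, the exact sum and the closed form can be compared directly. This is a sketch under the assumption that Equation (3) reads $\frac{K}{N} + \frac{1}{N}\sum_{j=K+1}^{N}\bigl(1 - \sum_{m=1}^{j-K}\frac{1}{m+K-1}\bigr)$ with threshold d = 0; the function names are illustrative.

```python
from fractions import Fraction
from math import log

def win_probability_large_window(n, k):
    """Exact Pr(Win) from Equation (3); assumes d = 0 and k >= n/2."""
    if 2 * k < n:
        raise ValueError("Equation (3) assumes K >= N/2")
    total = Fraction(k, n)                     # best applicant in [1, K]
    for j in range(k + 1, n + 1):              # best applicant at index j > K
        stopped_early = sum(Fraction(1, m + k - 1) for m in range(1, j - k + 1))
        total += (1 - stopped_early) / n
    return total

def asymptotic_win_probability(w):
    """Large-N approximation 2 - w + ln(w), with w = K/N in [1/2, 1]."""
    return 2 - w + log(w)

for n, k in [(20, 12), (200, 120), (1000, 600)]:
    print(n, k, float(win_probability_large_window(n, k)),
          asymptotic_win_probability(k / n))
```

With K = N - 1 the exact sum collapses to $1 - \frac{1}{N(N-1)}$, matching the special case treated in Section 3.1.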


3.2 A Recursive Formula for the Probability of Winning 

We now analyze the problem for some window size K and some threshold value d of automatically
rejected applicants. We divide the sequence of applicants after d into blocks of K because the sliding
rule guarantees that no block of K has more than one candidate. Let $f_q(a)$ be the probability of
stopping between $d + (q-1)K + 1$ and a, where $q = \lceil \frac{a-d}{K} \rceil$. Because no applicant
before d is chosen, $f_q(a) = 0$ for $q < 1$ and $a \le d$. We present a recursive formula for $f_q$.

Lemma 3.5. For q > 0,

$$f_q(a) = \sum_{m=d+(q-1)K+1}^{a} \frac{1}{m+K-1} \left(1 - \sum_{r=-1}^{q-2} f_r(d + rK) - f_{q-1}(m - K)\right). \tag{4}$$


Proof. By the sliding rule, the window stops sliding when a candidate is at the smallest index of
the window. The probability that an applicant at some index m is a candidate is $\frac{1}{m+K-1}$.
However, m will not be reached if a candidate is between d + 1 and m - K, so we subtract the
probability that a candidate appears in the previous q - 2 blocks or between indices
$d + (q-2)K + 1$ and $m - K$. Summing these probabilities for all values of m in
$[d + (q-1)K + 1, a]$ yields

$$f_q(a) = \sum_{m=d+(q-1)K+1}^{a} \frac{1}{m+K-1} \left(1 - \sum_{r=-1}^{q-2} f_r(d + rK) - f_{q-1}(m - K)\right).$$

□

We now find Pr(Win) for a particular N, K, and threshold index d. Let $\sigma_q(a)$ be the probability
of winning with a candidate in $[1, a]$, where $q = \lceil \frac{a-d}{K} \rceil$. Because no applicants
before d are chosen, $\sigma_q(a) = 0$ for $q < 1$ and $a \le d$. We present a recursive formula for
$\sigma_q$.

Theorem 3.6. For q > 0,

$$\sigma_q(a) = \sigma_{q-1}(d + (q-1)K) + \frac{1}{N} \sum_{j=d+(q-1)K+1}^{a} \left(1 - \sum_{r=-1}^{q-2} f_r(d + rK) - f_{q-1}(j - K)\right). \tag{5}$$


Proof. The probability of winning with a candidate between 1 and a is the sum of the probability
of winning with a candidate between indices 1 and $d + (q-1)K$ and the probability of winning
with a candidate between indices $d + (q-1)K + 1$ and a. The probability of the former event is
$\sigma_{q-1}(d + (q-1)K)$, so we find the probability of the latter event. The probability that the best
applicant is at an index j is $\frac{1}{N}$, and the probability of stopping before j is subtracted as in
the proof of Lemma 3.5. Summing the probability of winning at index $j \in [d + (q-1)K + 1, a]$
yields

$$\sigma_q(a) = \sigma_{q-1}(d + (q-1)K) + \frac{1}{N} \sum_{j=d+(q-1)K+1}^{a} \left(1 - \sum_{r=-1}^{q-2} f_r(d + rK) - f_{q-1}(j - K)\right).$$

□

If we let $q' = \lceil \frac{N-d}{K} \rceil$, we see that Pr(Win) $= \sigma_{q'}(N)$.
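The block recursion can be evaluated without enumerating permutations. The sketch below is not the paper's block-indexed formulation: it assumes an equivalent per-index form in which the probability of stopping at index m is $\frac{1}{m+K-1}$ times the probability that no stop occurred at any index up to m - K, and the best applicant at index j is won exactly when no stop occurred up to j - K. Function names are illustrative.

```python
from fractions import Fraction

def win_probability(n, k, d):
    """Exact Pr(Win) for n applicants, window size k, and threshold d."""
    cum = [Fraction(0)] * (n + 1)   # cum[m]: probability of stopping in (d, m]
    win = Fraction(0)
    for m in range(d + 1, n + 1):
        survive = 1 - cum[max(m - k, d)]   # no stop at any index <= m - k
        win += survive / n                 # best applicant sits at index m
        # Stops at indices above n - k can never precede a winning index,
        # so they are left at zero.
        stop_here = survive / (m + k - 1) if m + k - 1 < n else Fraction(0)
        cum[m] = cum[m - 1] + stop_here
    return win

def optimal_threshold(n, k):
    """Return (d, Pr(Win)) for the smallest threshold maximizing Pr(Win)."""
    return max(((d, win_probability(n, k, d)) for d in range(n)),
               key=lambda pair: pair[1])

# Optimal threshold and win probability as the window grows, for N = 20.
for k in (1, 2, 5, 10, 15, 19):
    d_best, p_best = optimal_threshold(20, k)
    print(k, d_best, float(p_best))
```

Because each call is quadratic-free and linear in N, the same routine can be used to tabulate optimal thresholds for N in the hundreds, which is how the trends proved in Lemmas 3.2 and 3.3 can be inspected numerically.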

Analyzing the Sliding-Window Secretary Problem for large N provides intuition for the optimal
strategy for any N. In the classical secretary problem, Pr(Win) depends on $\frac{d}{N}$ for N large. We
now show that Pr(Win) depends on $p = \frac{d}{N}$ and $w = \frac{K}{N}$ for N large and for K large for the
Best-1 case by rewriting the functions $f$ and $\sigma$ for large N as integrals. We constrain
$K \gg 1$, $N \gg 1$, and $N > K$.

The function $f$ can be reduced to a new function $F$ for large N. We let $w = \frac{K}{N}$,
$x = \frac{a}{N}$, $p = \frac{d}{N}$, and $z = \frac{m}{N}$. As in the proof of Theorem 3.4, because K is
large, we define $F_q$ with integrals:

$$F_q(x) = \int_{p+(q-1)w}^{x} \frac{1}{z+w} \left(1 - \sum_{r=0}^{q-2} F_r(p + rw) - F_{q-1}(z - w)\right) dz. \tag{6}$$

Similarly, the function $\sigma$ can be reduced to a function $\tau$ for large N with the same
normalization, where now $v = \frac{j}{N}$. Then,

$$\tau_q(x) = \tau_{q-1}(p + (q-1)w) + \int_{p+(q-1)w}^{x} \left(1 - \sum_{r=0}^{q-2} F_r(p + rw) - F_{q-1}(v - w)\right) dv. \tag{7}$$

The expressions for $F$ and $\tau$ are functions of solely w, x and p. Thus, Pr(Win) is only a function
of w and p, because Pr(Win) $= \tau_{q'}(1)$. As a result, the optimal normalized threshold,
$p^* = \frac{d^*}{N}$, and the optimal probability of winning depend on only $w = \frac{K}{N}$. We now look
at the asymptotics for large K and large N.


3.2.1 Asymptotic Optimal Thresholds for Large N, Fixed Ratio K/N

We use the definitions of functions $F$ and $\tau$ in Equations (6) and (7) respectively to find the
optimal p* for various w. Appendix D tabulates some values of p* for a normalized window size w
for large N, and Figure 2 shows a spline interpolation of the tabulated values along with values
of $\frac{d^*}{N}$ for N = 10, 20, and 100. Because the spline interpolation of the tabulated values
estimates the optimal thresholds for N = 100 well, the spline interpolation can predict optimal
thresholds for large N. In Figure 3, we show values of the optimal Pr(Win) for select values of
$\frac{K}{N}$ and N large. The graph predicts the window sizes needed for various probabilities of success.

Figure 2: How the Normalized Threshold varies with Normalized Window Size $\frac{K}{N}$. The plot,
titled "Normalized Window Size vs. Normalized Optimal Threshold Values for Various N: Best 1
Sliding Window Secretary Problem", shows a spline interpolation for the optimal thresholds for
large N together with the normalized optimal threshold values for N = 100, N = 20, and N = 10.


4 The Sliding-Window Problem: The Best-2 Case 


We now study a Sliding-Window Secretary Problem similar to the Best-1 case, with a payoff of 1
for choosing one of the top two applicants, and 0 otherwise. A candidate can be the best or second
best out of all seen applicants. Define a 1-candidate to be a candidate who is the best out of all seen
applicants, and a 2-candidate to be a candidate who is the second best out of all seen applicants.
The interviewer loses nothing if a sliding rule is adopted in which the interviewer rejects applicants
until the best candidate in the window is at the window's first index. We now show that the optimal
strategy of the Best-2 case has at most two thresholds.


Theorem 4.1. The optimal strategy for the Best-2 case has at most two thresholds, $d_1^*$ and $d_2^*$,
where the first $d_1^*$ applicants are rejected, the first 1-candidate after $d_1^*$ is chosen, and the first
1- or 2-candidate after $d_2^*$ is chosen.

Proof. The probability of winning at an index i given that a 1-candidate is at index i is the
probability that the 1-candidate is the best or second best overall. By inclusion-exclusion,

$$\Pr(\text{Win with 1-candidate at index } i) = \frac{i+K-1}{N} + \frac{i+K-1}{N} - \left(\frac{i+K-1}{N}\right)\left(\frac{i+K-2}{N-1}\right).$$


















Figure 3: The Variation in Pr(Win) for Different Values of the Normalized Window Size $w = \frac{K}{N}$
for large N for the Best-1 Sliding-Window Secretary Problem


In order for a 2-candidate to be the second best, the best must have been passed, so

$$\Pr(\text{Win with 2-candidate at index } i) = \left(\frac{i+K-1}{N}\right)\left(\frac{i+K-2}{N-1}\right).$$

Similar to the proof of Theorem 3.1, Pr(Win | Rejecting Candidate i) decreases in i. Because
applicant N provides a win with probability if applicant N - K is a candidate, the probability
of rejecting applicant N - K and winning is A sketch of Pr(Win with 1-candidate i),
Pr(Win with 2-candidate i), and Pr(Win | Rejecting Candidate i) is shown in Figure 4 to provide
intuition for the remaining part of the proof. Therefore, as in Theorem 3.1, for both types of
candidates we define two integers $d_1^* + 1$ and $d_2^* + 1$ to be the smallest indices at which the
interviewer should not reject a 1-candidate or 2-candidate respectively. Because
Pr(Win with 1-candidate i) > Pr(Win with 2-candidate i), $d_1^* \le d_2^*$. □
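The two candidate probabilities used in this proof can be checked by counting. The sketch below assumes they read $\frac{2n}{N} - \frac{n(n-1)}{N(N-1)}$ for a 1-candidate and $\frac{n(n-1)}{N(N-1)}$ for a 2-candidate, where $n = i + K - 1$ is the number of applicants seen; it then counts the placements of the two best applicants overall. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def win_with_1_candidate(i, k, n):
    """Pr(the best of the first i+k-1 applicants is one of the top two overall)."""
    seen = i + k - 1
    return Fraction(2 * seen, n) - Fraction(seen * (seen - 1), n * (n - 1))

def win_with_2_candidate(i, k, n):
    """Pr(the top two overall both lie among the first i+k-1 applicants)."""
    seen = i + k - 1
    return Fraction(seen * (seen - 1), n * (n - 1))

def check_by_enumeration(i, k, n):
    """Count ordered placements (p1, p2) of the best and second-best applicants."""
    seen = i + k - 1
    pairs = list(permutations(range(1, n + 1), 2))
    one = sum(p1 <= seen or p2 <= seen for p1, p2 in pairs)
    two = sum(p1 <= seen and p2 <= seen for p1, p2 in pairs)
    return Fraction(one, len(pairs)), Fraction(two, len(pairs))

print(win_with_1_candidate(3, 2, 8), win_with_2_candidate(3, 2, 8))
print(check_by_enumeration(3, 2, 8))
```

The 1-candidate probability always dominates the 2-candidate probability, which is the comparison the proof relies on when ordering the two thresholds.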


By the definitions of $d_1^*$ and $d_2^*$ in Theorem 4.1, the optimal strategy for the Best-2 case is to
reject the first $d_1^*$ applicants, to choose the first 1-candidate after $d_1^*$ with the sliding rule, and
to choose the first 1- or 2-candidate after $d_2^*$ with the sliding rule.

There are four subcases for the indices of the best and second best applicants, $j_1$ and $j_2$,
respectively:

(i) $j_1 \le d_1$ and $j_2 > d_2$

(ii) $j_2 \le d_1$ and $j_1 > d_1$

(iii) $j_1 > d_1$ and $j_2 > d_1$

(iv) $j_1 \le d_1$ and $j_2 \le d_2$.

Location (iv) guarantees a loss, so we only need to consider (i), (ii), and (iii).

Figure 4: A Pictorial Representation of Theorem 4.1. The probability of winning with candidate i
strictly increases in i for both types of candidates while the probability of winning with candidate
i rejected decreases in i. The intersections of the curves show the threshold values.

4.1 Special Cases for K

We first consider a window size of K = N - 2.

For window size K = N - 2, we find the optimal $d_1^*$ and $d_2^*$ among the following 4 cases: (i)
$d_1 = 0$ (because if no 1-candidates are skipped we need not consider 2-candidates); (ii) $d_1 = 1$ and
$d_2 = 1$; (iii) $d_1 = 1$ and $d_2 = 2$; and (iv) $d_1 = 2$ and $d_2 = 2$.

For $d_1 = 0$, the interviewer loses if and only if there is a 1-candidate in the first index, but the
second best and best applicants are in the last two indices, i.e., $R(1) = 3$, $R(N-1)$ is either 2 or 1,
and $R(N)$ is either 1 or 2. This event occurs with probability $\frac{2}{N(N-1)(N-2)}$. For $d_1 = 1$ and
$d_2 = 1$, the interviewer loses if and only if applicant 1 is either the best or second best, and
applicant 2 is a candidate that does not provide a win, i.e., $R(1)$ is either 1 or 2, $R(2) = 3$, and
$R(N)$ is either 2 or 1. This event occurs with probability $\frac{2}{N(N-1)(N-2)}$. For $d_1 = 1$ and
$d_2 = 2$, the interviewer loses if and only if the interviewer skips the first and second place
applicants, i.e., $R(1) = 1$ and $R(2) = 2$. This event occurs with probability $\frac{1}{N(N-1)}$. Finally
for $d_1 = 2$ and $d_2 = 2$, the interviewer loses if and only if the interviewer skips the first and second
place applicants, i.e., $R(1)$ is either 1 or 2, and $R(2)$ is either 2 or 1. This event occurs with
probability $\frac{2}{N(N-1)}$. Thus, the probability of winning is maximized for $d_1^* = 0$, or for
$d_1^* = 1$ and $d_2^* = 1$, for N > 4.

The results of a simulation for small values of N and K in Appendix E suggest that as K
increases, Pr(Win) strictly increases and both optimal thresholds decrease. We formally prove this
below. We first prove that Pr(Win) strictly increases as K increases.

Lemma 4.2. Let $d_{K1}^*$ and $d_{K2}^*$ denote the optimal first and second thresholds respectively for a
sliding window of size K, and let similar notation hold for k. If k < K, then
$\Pr(\text{Win} \mid k, d_{k1}^*, d_{k2}^*) < \Pr(\text{Win} \mid K, d_{K1}^*, d_{K2}^*)$.

Proof. We prove that
$\Pr(\text{Win} \mid k, d_{k1}^*, d_{k2}^*) < \Pr(\text{Win} \mid K, d_{k1}^*, d_{k2}^*) \le \Pr(\text{Win} \mid K, d_{K1}^*, d_{K2}^*)$.
Because $d_{K1}^*$ and $d_{K2}^*$ are optimal for K,
$\Pr(\text{Win} \mid K, d_{k1}^*, d_{k2}^*) \le \Pr(\text{Win} \mid K, d_{K1}^*, d_{K2}^*)$. Therefore,
we prove that $\Pr(\text{Win} \mid k, d_{k1}^*, d_{k2}^*) < \Pr(\text{Win} \mid K, d_{k1}^*, d_{k2}^*)$.

Let us first consider winning with a 2-candidate after $d_{k2}^*$ given that $j_1 \le d_{k1}^*$. Since K > k, it
follows from Lemma 3.2 that we win more frequently with a window of K than with a window of
k. Similarly, let us consider winning with a 1-candidate, or with threshold $d_{k1}^*$. Then Lemma 3.2
exactly applies, either if $j_2 < j_1$ or $j_2 > j_1$ since both will be considered as 1-candidates. Because
each subcase occurs with the same probability for identical thresholds, Simpson's paradox does not
apply and therefore $\Pr(\text{Win} \mid k, d_{k1}^*, d_{k2}^*) < \Pr(\text{Win} \mid K, d_{k1}^*, d_{k2}^*)$. □

We now prove that both optimal thresholds decrease as the window size increases. 

Lemma 4.3. If k < K, then $d_{K1}^* \le d_{k1}^*$ and $d_{K2}^* \le d_{k2}^*$.

Proof. The $d_2$ threshold is only relevant for the case in which $j_1 \le d_1$ and $j_2 > d_2$. We can use
the same argument as in Lemma 3.3 to conclude that $d_{K2}^* \le d_{k2}^*$. Similarly we can use the same
argument as in Lemma 3.3 for the $d_1$ threshold to conclude that $d_{K1}^* \le d_{k1}^*$. □


We now find the maximum value of K for which $d_2^*$ is no longer 1.


Theorem 4.4. For large N, K = is the largest window size for which d^ 


> 1 . 


Proof. We only consider 2-candidates after $d_2^*$. Because for large N, the probability that a
2-candidate in the first $\frac{N}{2}$ applicants is the 2nd best applicant overall is $\frac{1}{4}$, but the
probability that there is a better applicant later is $\frac{3}{4}$, we consider only $K > \frac{N}{2}$. Thus the
interviewer sees all applicants if he skips a 2-candidate at index 2.

If $d_2^*$ is 1, Pr(Win) is higher if a 2-candidate at index 2 is chosen than if the 2-candidate is
rejected. The probability that the 2-candidate at index 2 is the second best overall is the probability
that the first and second best applicants overall are between 1 and K + 1. The probability of winning
if the 2nd applicant is rejected is equal to the probability that the best or second best applicants
are after K + 1. Thus, for $d_2^* = 1$,

$$\left(\frac{K+1}{N}\right)\left(\frac{K}{N-1}\right) > \frac{N-K-1}{N} + \left(\frac{K+1}{N}\right)\left(\frac{N-K-1}{N-1}\right).$$

The largest value at which the inequality does not hold is 

V^8(A- 1)2 + 1 + 1 A-1 

-i-“+r- 


□

4.2 A Recursive Formula for the Probability of Winning


As in the Best-1 case, we now derive a general solution for the Best-2 case, assuming possibly
non-optimal thresholds of $d_1$ and $d_2$. We again consider blocks of size K after $d_1$ and after $d_2$. Let
$h_s(a)$ be the probability of stopping at a 2-candidate between $d_2 + (s-1)K + 1$ and a given that
the best applicant that the interviewer has interviewed is between 1 and $d_1$, where
$s = \lceil \frac{a-d_2}{K} \rceil$. Let $g_s(a)$ be the probability of stopping at a 2-candidate between
$d_2 + (s-1)K + 1$ and a, where $s = \lceil \frac{a-d_2}{K} \rceil$. Finally, let $f_q(a)$ be the probability
that the interviewer stops at a 1-candidate between $d_1 + (q-1)K + 1$ and a, where
$q = \lceil \frac{a-d_1}{K} \rceil$. For $q \le 0$, $f$ is 0, and for $s \le 0$, $g$ and $h$ are both 0, because the
interviewer does not select 2-candidates before $d_2$ and does not select 1-candidates before $d_1$. We
now present recursive formulas for $f_q$, $g_s$, and $h_s$, for q > 0 and for s > 0. For convenience, let

$$c(i) = \sum_{r=-1}^{\lceil \frac{i-d_2}{K} \rceil - 2} h_r(d_2 + rK) + h_{\lceil \frac{i-d_2}{K} \rceil - 1}(i - K),$$

and 

1 ^ 1-2 1 ^ 1-2 

^(*) = X] ifridi+rK)) + K) + ^ {gr(d 2 + rK)) + g^i-d^,,^ 

r=—1 r=—1 

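Lemma 4.5. For $s > 0$ and for $q > 0$,

\[ h_s(a) = \sum_{i=d_2+(s-1)K+1}^{a} \frac{1}{i+K-2} \, (1 - c(i)), \]

\[ g_s(a) = \sum_{i=d_2+(s-1)K+1}^{a} \left( \frac{d_1}{i+K-1} \right) \left( \frac{1}{i+K-2} \right) (1 - c(i)), \]

\[ f_q(a) = \sum_{i=d_1+(q-1)K+1}^{a} \frac{1}{i+K-1} \, (1 - \tau(i)). \]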


Proof. For the function $h$, the probability of stopping at a 2-candidate at an index $i$ given that the best applicant out of all seen applicants is in $[1, d_1]$ is $\frac{1}{i+K-2}$, because 1 of the $(i + K - 1)$ indices is occupied by the best applicant so far. In addition, probabilities of stopping earlier can be subtracted in blocks of $K$ as in the proof of Lemma 3.5. Therefore,

\[ h_s(a) = \sum_{i=d_2+(s-1)K+1}^{a} \frac{1}{i+K-2} \, (1 - c(i)). \]


For the function $g$, an additional $\frac{d_1}{i+K-1}$ term is added to guarantee that the best applicant out of all seen applicants is in the first $d_1$ indices. When subtracting the probabilities of stopping before, we use the function $h$ because the additional term already accounts for the best applicant so far being restricted to $[1, d_1]$. Therefore,

\[ g_s(a) = \sum_{i=d_2+(s-1)K+1}^{a} \left( \frac{d_1}{i+K-1} \right) \left( \frac{1}{i+K-2} \right) (1 - c(i)). \]
Finally, for the function $f$, the probability of having a 1-candidate at an index $i$ is $\frac{1}{i+K-1}$, while the probabilities of stopping at earlier candidates need to be subtracted. Therefore,

\[ f_q(a) = \sum_{i=d_1+(q-1)K+1}^{a} \frac{1}{i+K-1} \, (1 - \tau(i)). \]

Now, let $\sigma_1(a)$ return the probability of winning with a candidate between 1 and $a$ for subcase (i). Let $\sigma_2(a)$ return the probability of winning with a candidate between 1 and $a$ for subcase (ii). Let $\sigma_3(a)$ return the probability of winning with a candidate between 1 and $a$ for subcase (iii). For $a \le d_2$, $\sigma_1(a) = 0$, because we do not choose 2-candidates before $d_2$. For $a \le d_1$, $\sigma_2(a) = \sigma_3(a) = 0$ because we do not choose candidates before $d_1$. We now present recursive formulas for the $\sigma$ functions.

Theorem 4.6. Let $q = \lceil \frac{a - d_1}{K} \rceil$ and $s = \lceil \frac{a - d_2}{K} \rceil$. Then,

ai{a) = ai{d 2 + {s-l)K)+ ^ ^ (iV^) ’ 

0-2(0) = f72(ci2 + (s - l)iL) + ^ ^ (iv^) ’ 

i=d 2 + {s-l)K+l ^ ^ 

0-3(0) = 03(^1(g-l)iL)^ 2 

i=di+(q-l)K+l ^ 

Proof. We find 0-1 (a) by summing ai{d 2 + {s — l)K) with the probability of winning with a candidate 
in [{d 2 + {s — l)iF),a]. The probability of stopping at j 2 in subcase (i) is because the 

best applicant has to be among the first di applicants After subtracting probabilities of stopping 
earlier, we find 

ai{a) = ai{d 2 + {s-l)K)+ ^ ^ (iV^) ' 

i=d 2 +{s-l)K+l ^ ^ 

The probability of stopping at ji in subcase (ii) similarly is W (n^)- Therefore, 

02(0) = cJ2(d2 + (s - 1 )A') + ^ ^ (iv^) 

i=d2 + is-l)K+l ^ ^ 

Finally, the probability of stopping at either ji or j 2 in subcase (iii) is because 

permuting ji and j 2 does not affect Pr(Win). Accounting for not stopping earlier yields 

a3{a) = asidi + {q - 1)K) + ^ ~ ° 

i=di+{q-l)K+l ^ ^ 

We now find that Pr(Win) = $\sigma_1(N) + \sigma_2(N) + \sigma_3(N)$.
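For large $N$ and large $K$, we normalize the functions so that $H(\frac{a}{N}) \approx h(a)$, $G(\frac{a}{N}) \approx g(a)$, $F(\frac{a}{N}) \approx f(a)$, and $\tau_l(\frac{a}{N}) \approx \sigma_l(a)$ for $l \in \{1, 2, 3\}$. We let $w = \frac{K}{N}$, $x = \frac{a}{N}$, $p_1 = \frac{d_1}{N}$, and $p_2 = \frac{d_2}{N}$, and write the asymptotic functions using integrals:

\[ H_s(x) = \int_{p_2+(s-1)w}^{x} \frac{1}{v+w} \, (1 - C(v)) \, dv, \]

\[ G_s(x) = \int_{p_2+(s-1)w}^{x} \frac{p_1}{(v+w)^2} \, (1 - C(v)) \, dv, \]

\[ F_q(x) = \int_{p_1+(q-1)w}^{x} \frac{1}{v+w} \, (1 - T(v)) \, dv, \]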

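where

\[ C(v) = \sum_{r=-1}^{\lceil \frac{v-p_2}{w} \rceil - 2} H_r(p_2 + rw) + H_{\lceil \frac{v-p_2}{w} \rceil - 1}(v - w), \]

and

\[ T(v) = \sum_{r=-1}^{\lceil \frac{v-p_1}{w} \rceil - 2} F_r(p_1 + rw) + F_{\lceil \frac{v-p_1}{w} \rceil - 1}(v - w) + \sum_{r=-1}^{\lceil \frac{v-p_2}{w} \rceil - 2} G_r(p_2 + rw) + G_{\lceil \frac{v-p_2}{w} \rceil - 1}(v - w). \]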

Similarly,

\[ \tau_1(x) = \tau_1(p_2 + (s-1)w) + \int_{p_2+(s-1)w}^{x} p_1 (1 - C(v)) \, dv, \]

\[ \tau_2(x) = \tau_2(p_2 + (s-1)w) + \int_{p_2+(s-1)w}^{x} p_1 (1 - C(v)) \, dv, \]

\[ \tau_3(x) = \tau_3(p_1 + (q-1)w) + \int_{p_1+(q-1)w}^{x} 2(1 - v)(1 - T(v)) \, dv. \]

As with the Best-1 case, Pr(Win) = $\tau_1(1) + \tau_2(1) + \tau_3(1)$ depends only on $w$, $p_1$, and $p_2$ for large $N$. When finding the optimal $p_1$ and $p_2$, or $p_1^*$ and $p_2^*$, the two equations ($\frac{\partial \Pr(\mathrm{Win})}{\partial p_1} = 0$ and $\frac{\partial \Pr(\mathrm{Win})}{\partial p_2} = 0$) are solved, and thus $p_1^*$, $p_2^*$, and the optimal Pr(Win) only depend on $\frac{K}{N}$ for $N$ large. We now look at asymptotics for large $K$ and large $N$.

4.3 Asymptotic Optimal Thresholds: Large N, Fixed $\frac{K}{N}$

As with the Best-1 case, we can use the recursions for large N to find the optimal normalized thresholds $p_1^*$ and $p_2^*$ as a function of the normalized window size $w$. However, MATLAB was not able to compute the recursions in integral form. As a result, we present the optimal normalized thresholds for the normalized window size $w$ in three cases: N = 10, N = 100, and N = 1000. A spline interpolation of the optimal normalized thresholds for N = 1000, along with the values of the optimal normalized thresholds for N = 10 and N = 100 for select normalized window sizes, is shown in Figure 5.

[Figure 5 plot: "Normalized Window Size vs. Normalized Optimal Threshold Values for Various N: Best 2 Sliding Window Secretary Problem". Plotted series: values of $p_1^*$ and $p_2^*$ for N = 10 and N = 100; spline interpolations of $p_1^*$ and $p_2^*$ for N = 1000.]

Figure 5: The Variation of the Normalized Thresholds $p_1^* = \frac{d_1^*}{N}$ and $p_2^* = \frac{d_2^*}{N}$ with the Normalized Window Size $w = \frac{K}{N}$ for N = 10, 20, 100

In addition, a spline interpolation of select values of K and the probability of winning with that value of K is shown in Figure 6 for N = 100 to show what values of K are required to guarantee certain probabilities of winning.


4.4 Extensions to Winning with One of the Top L Applicants 

Our analysis generalizes in a straightforward way to the top L case, where the interviewer wins 
if one of the Top L applicants is chosen. There will be L different types of candidates in
the Top-L problem, and it follows from Theorem 4.1 that the optimal strategy has L thresholds.
Equations similar to those in Theorem 4.6 can be used to find the probability of winning for various 
window sizes, some fixed number of applicants, and different threshold values. 


5 The Sliding-Window Problem: The 2-Choice Case 

We now examine a Sliding-Window Secretary Problem similar to the Best-1 Case, where we grant the interviewer the ability to choose two applicants and a win occurs if either of the two applicants is the best overall. Again, the same sliding rule is implemented because it costs us nothing. We first show that the 2-Choice Case has two thresholds and then give the optimal decision strategy given the two thresholds.
[Figure 6 plot: "How the Optimal Probability of Winning Varies with Window Size for N=100 in the Best 2 Sliding Window Secretary Problem". Horizontal axis: Normalized Window Size.]

Figure 6: The Variation in Pr(Win) for Different Values of the Normalized Window Size $x = \frac{K}{N}$ for large N for the Best-2 Sliding-Window Secretary Problem


Theorem 5.1. The optimal strategy has at most 2 thresholds for the 2-Choice Case. Given thresholds $\delta_1^*$ and $\delta_2^*$, the optimal strategy is to reject the first $\delta_1^*$ applicants, to choose the first candidate, $m_1$, after $\delta_1^*$ with the sliding rule, and then to choose the first candidate, $m_2$, after both $m_1$ and $\delta_2^*$.

Proof. We first prove that the interviewer's second choice has an optimal threshold, $\delta_2^*$. By the sliding rule,

\[ \Pr(\text{Win with candidate } i \text{ as a second choice}) = \frac{i + K - 1}{N}. \]

As with the proof to Theorem 3.1, the probability of rejecting candidate $i$ and winning decreases in $i$, and is lower than the probability of choosing candidate $i$ and winning if $i = N - K$. Therefore, as with the proof to Theorem 3.1, for some optimal threshold $\delta_2^*$, choosing the first candidate after $\delta_2^*$ as a second choice maximizes the probability of winning.

We now prove that the interviewer's first choice has an optimal threshold, $\delta_1^*$. We consider the function $q(i)$, the probability of winning if the interviewer's first choice is candidate $i$, and $p(i)$, the probability of choosing the best applicant as a second choice given that the best applicant is after candidate $i$. Because the interviewer wins either if candidate $i$ is the best applicant, or if the interviewer's second choice after $i$ is the best applicant,

\[ q(i) = \frac{i + K - 1}{N} + \left( 1 - \frac{i + K - 1}{N} \right) p(i). \]

If the best applicant is after index i, the interviewer finds the best applicant with higher probability 
if there are fewer applicants after i and thus if i is larger. Therefore p{i) increases in i. We show 
that $q(i)$ increases in $i$ by computing $q(i+1) - q(i)$:

\[ q(i+1) - q(i) = \frac{1 - p(i)}{N} + \frac{N - i - K}{N} \, \bigl( p(i+1) - p(i) \bigr). \]

Because $p(i+1) \ge p(i)$ and $p(i) < 1$, $q(i+1) - q(i) > 0$ and $q(i)$ increases in $i$. Additionally, as with the proof to Theorem 3.1, the probability of rejecting candidate $i$ and winning decreases in $i$, and is lower than the probability of choosing candidate $i$ and winning if $i = N - K$. Therefore, as with the proof to Theorem 3.1, for some optimal threshold $\delta_1^*$, choosing the first candidate after $\delta_1^*$ as a first choice maximizes the probability of winning. □
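The identity Pr(Win with candidate i) = (i + K - 1)/N used in the proof can be checked by exhaustive enumeration for small N. The following Python sketch rests on an assumed reading of the sliding rule: applicant i is held until index min(i + K - 1, N) and is chosen only if it is still the best applicant seen by then. The function name is illustrative and does not come from the paper.

```python
from fractions import Fraction
from itertools import permutations


def conditional_best_probability(n, k, i):
    """P(applicant i is best overall | i is still best when it leaves the window).

    Assumption: applicant i leaves the window after index min(i + k - 1, n).
    """
    horizon = min(i + k - 1, n)
    held = 0  # orderings in which applicant i is the best of the first `horizon`
    wins = 0  # ... and is also the best of all n applicants
    # ranks[j - 1] is the overall rank of applicant j (rank 1 is the best).
    for ranks in permutations(range(1, n + 1)):
        if ranks[i - 1] == min(ranks[:horizon]):
            held += 1
            if ranks[i - 1] == 1:
                wins += 1
    return Fraction(wins, held)


if __name__ == "__main__":
    n, k, i = 6, 3, 2
    print(conditional_best_probability(n, k, i), Fraction(i + k - 1, n))
```

Both printed fractions should agree whenever i + K - 1 does not exceed N; beyond that point the conditional probability is 1 because the candidate is already the best of all N applicants.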


Let $\delta_1^*$ and $\delta_2^*$ be as defined in Theorem 5.1. Then it is optimal to reject all applicants before $\delta_1^* + 1$, to choose the first candidate to appear after $\delta_1^*$ with the sliding rule, and to choose another candidate who is the first to appear after both the first candidate and $\delta_2^*$. Note that $\delta_2^* \ge \delta_1^* + K$ because a block of $K$ applicants cannot have two candidates by the sliding rule. The interviewer can win in three mutually exclusive subcases:

(i) The first choice is the winning choice.

(ii) The second choice is the winning choice and the first choice was made before $\delta_2^* - K + 2$.

(iii) The second choice is the winning choice and the first choice was made after $\delta_2^* - K + 1$.

5.1 Special Cases for K 

We first find that the optimal probability of winning is 1 for a window size of $K \ge \frac{N}{2}$.

Theorem 5.2. The optimal probability of winning for the interviewer is 1 if $K \ge \frac{N}{2}$.

Proof. Because there cannot be two candidates in the same block of $K$ applicants by the sliding rule, and because $2K \ge N$, there are at most two candidates that the interviewer can consider. Because the interviewer has two choices, the interviewer can choose both candidates. Because the best applicant overall is guaranteed to be a candidate, the interviewer is guaranteed to win. □

For the interviewer to choose all possible candidates for a window size of $K \ge \frac{N}{2}$, $\delta_1 = 0$ and $\delta_2 = K$. Therefore, the optimal thresholds $\delta_1^*$ and $\delta_2^*$ are 0 and $K$ respectively.
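Theorem 5.2 can be sanity-checked by brute force for small N. The sketch below is an illustration under stated assumptions, not the authors' computation: a candidate is taken to be an applicant who is still the best seen at index min(i + K - 1, N), the first choice is the first candidate after $\delta_1$, and the second choice is the first candidate after both the first choice and $\delta_2$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def is_candidate(ranks, i, k):
    """True if applicant i is still the best seen when it leaves the window."""
    n = len(ranks)
    horizon = min(i + k - 1, n)
    return ranks[i - 1] == min(ranks[:horizon])


def first_candidate_after(ranks, start, k):
    """Index of the first candidate strictly after `start`, or None."""
    for i in range(start + 1, len(ranks) + 1):
        if is_candidate(ranks, i, k):
            return i
    return None


def two_choice_win_probability(n, k, delta1, delta2):
    """Exact Pr(Win) of the two-threshold rule, by enumerating all orderings."""
    wins = 0
    total = 0
    # ranks[j - 1] is the overall rank of applicant j (rank 1 is the best).
    for ranks in permutations(range(1, n + 1)):
        total += 1
        first = first_candidate_after(ranks, delta1, k)
        if first is None:
            continue  # no choice was possible after the first threshold
        if ranks[first - 1] == 1:
            wins += 1
            continue
        second = first_candidate_after(ranks, max(first, delta2), k)
        if second is not None and ranks[second - 1] == 1:
            wins += 1
    return Fraction(wins, total)


if __name__ == "__main__":
    print(two_choice_win_probability(6, 3, 0, 3))  # window of half the applicants
    print(two_choice_win_probability(6, 2, 0, 2))  # smaller window
```

Under these assumptions the first call covers the case K = N/2 with thresholds 0 and K, where every candidate can be chosen, while the second call shows a window that is too small for a guaranteed win.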

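We now prove that as K increases, Pr(Win) strictly increases for fixed N.

Lemma 5.3. Let $\delta^*_{K1}$ and $\delta^*_{K2}$ be the optimal first and second thresholds respectively for a window size of $K$, and let the same notation hold for a window size of $\kappa$. Then if $\kappa < K$, $\Pr(\mathrm{Win} \mid \kappa, \delta^*_{\kappa 1}, \delta^*_{\kappa 2}) < \Pr(\mathrm{Win} \mid K, \delta^*_{K1}, \delta^*_{K2})$.

Proof. We prove that $\Pr(\mathrm{Win} \mid \kappa, \delta^*_{\kappa 1}, \delta^*_{\kappa 2}) < \Pr(\mathrm{Win} \mid K, \delta^*_{\kappa 1}, \delta^*_{\kappa 2}) \le \Pr(\mathrm{Win} \mid K, \delta^*_{K1}, \delta^*_{K2})$. Since $\delta^*_{K1}$ and $\delta^*_{K2}$ are optimal for $K$, $\Pr(\mathrm{Win} \mid K, \delta^*_{\kappa 1}, \delta^*_{\kappa 2}) \le \Pr(\mathrm{Win} \mid K, \delta^*_{K1}, \delta^*_{K2})$, and we now prove that $\Pr(\mathrm{Win} \mid \kappa, \delta^*_{\kappa 1}, \delta^*_{\kappa 2}) < \Pr(\mathrm{Win} \mid K, \delta^*_{\kappa 1}, \delta^*_{\kappa 2})$.

By the same argument as in Lemma 3.2, every sequence of applicants that produces a win with thresholds $\delta^*_{\kappa 1}$ and $\delta^*_{\kappa 2}$ and a window size of $\kappa$ must also produce a win for identical thresholds and a window size of $K$. As in Lemma 3.2, we can construct a sequence of applicants such that the interviewer loses with thresholds $\delta^*_{\kappa 1}$ and $\delta^*_{\kappa 2}$ and a window size of $\kappa$, but wins with identical thresholds and a window size of $K$. Therefore, $\Pr(\mathrm{Win} \mid \kappa, \delta^*_{\kappa 1}, \delta^*_{\kappa 2}) < \Pr(\mathrm{Win} \mid K, \delta^*_{\kappa 1}, \delta^*_{\kappa 2})$. □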
We now prove that as K increases, the first optimal threshold decreases for fixed N. The second optimal threshold must be at least K greater than the first optimal threshold, so the second threshold may not necessarily decrease as K increases.

Lemma 5.4. If $\kappa < K$, then $\delta^*_{\kappa 1} \ge \delta^*_{K1}$.

Proof. From the proof of Theorem 5.1, for a candidate $i$, $\Pr(\mathrm{Win} \mid i \text{ is first choice}, K) \ge \Pr(\mathrm{Win} \mid i \text{ is first choice}, \kappa)$. Using a similar argument as in Lemma 3.1, for a candidate $i$, $\Pr(\mathrm{Win} \mid \text{Rejecting } i, K) \ge \Pr(\mathrm{Win} \mid \text{Rejecting } i, \kappa)$ for the first choice. By the same argument as in Lemma 3.3, $\delta^*_{\kappa 1} \ge \delta^*_{K1}$. □


5.2 A Recursive Formula for the Probability of Winning 


We now analyze the problem for some window size $K$ and some threshold values $\delta_1$ and $\delta_2$. We again will use blocks of size $K$ because of the sliding rule. We modify the function $f$ in Lemma 3.5 as follows: we let $f(x, a)$ be the probability of choosing an applicant between indices $x + K(\lceil \frac{a-x}{K} \rceil - 1) + 1$ and $a$. For $a \le x$, $f(x, a) = 0$. For $a > x$, using the same argument as in Lemma 3.5, we find that

\[ f(x, a) = \sum_{m = x + (\lceil \frac{a-x}{K} \rceil - 1)K + 1}^{a} \frac{1}{m + K - 1} \left( 1 - \sum_{r=-1}^{\lceil \frac{m-x}{K} \rceil - 2} f(x, x + rK) - f(x, m - K) \right). \]

We now let $g(m, x, b)$ be the probability of making a choice at an index $m$, not making another choice until index $b - K + 1$, given that the interviewer is guaranteed not to make a choice before index $x + 1$, and the interviewer chooses the applicant at index $b$. Let $c(m, x)$ be the probability of making a choice at index $m$ given that the interviewer is guaranteed not to make a choice before index $x + 1$. For $m \le x$, $c(m, x) = 0$, and for $m > x$,

\[ c(m, x) = \frac{1}{m + K - 1} \left( 1 - \sum_{r=-1}^{\lceil \frac{m-x}{K} \rceil - 2} f(x, x + rK) - f(x, m - K) \right). \]

By the sliding rule, because the first and second choices cannot be in the same block of $K$ applicants, if $b - m < K$, $g(m, x, b) = 0$. Otherwise, because there are no candidates between $m + 1$ and $m + K - 1$ by the sliding rule if $m$ is a candidate, and a random ordering of applicants guarantees that finding a candidate starting at index $m + K$ is independent of finding a candidate at index $m$, we can multiply $c(m, x)$ by the probability that we find no other candidate after index $m + K - 1$, given that applicant $m$ is a candidate that we have chosen. Therefore,



\[ g(m, x, b) = c(m, x) \left( 1 - \sum_{r=-2}^{\lceil \frac{b-m-K+1}{K} \rceil} f(m + K - 1, \, m + rK - 1) - f(m + K - 1, \, b - K) \right). \]

We now divide the probability of winning as follows: $\sigma_1(a)$ is the probability of winning between $[1, a]$ with subcase (i), $\sigma_2(a)$ is the probability of winning between $[1, a]$ with subcase (ii), and $\sigma_3(a)$ is the probability of winning between $[1, a]$ with subcase (iii). For $a \le \delta_1$, $\sigma_1(a) = 0$, and for $a \le \delta_2$, $\sigma_2(a) = \sigma_3(a) = 0$. We compute $p_b$, the probability of making a choice between $\delta_1 + 1$ and $\delta_2 - K + 1$, as

\[ p_b = \sum_{r=0}^{\lceil \frac{\delta_2 - \delta_1 - K + 1}{K} \rceil - 1} f(\delta_1, \delta_1 + rK) + f(\delta_1, \delta_2 - K + 1). \]
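Theorem 5.5. If $a > \delta_1$ and $q = \lceil \frac{a - \delta_1}{K} \rceil$, then

\[ \sigma_1(a) = \sigma_1(\delta_1 + (q-1)K) + \frac{1}{N} \sum_{j=\delta_1+(q-1)K+1}^{a} \left( 1 - \sum_{r=-1}^{\lceil \frac{j-\delta_1}{K} \rceil - 2} f(\delta_1, \delta_1 + rK) - f(\delta_1, j - K) \right). \]

If $a > \delta_2$ and $q = \lceil \frac{a - \delta_2}{K} \rceil$, then

\[ \sigma_2(a) = \sigma_2(\delta_2 + (q-1)K) + \frac{p_b}{N} \sum_{j=\delta_2+(q-1)K+1}^{a} \left( 1 - \sum_{r=-1}^{\lceil \frac{j-\delta_2}{K} \rceil - 2} f(\delta_2, \delta_2 + rK) - f(\delta_2, j - K) \right). \]

If $a > \delta_2$, then

\[ \sigma_3(a) = \sigma_3(a - 1) + \frac{1}{N} \sum_{m=\delta_2-K+2}^{a-1} g(m, \delta_1, a - K + 1). \]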


Proof. The expression for $\sigma_1$ follows directly from Theorem 3.6. Similarly, the expression for $\sigma_2$ follows directly from Theorem 3.6, except with the added condition that a choice is made between $\delta_1 + 1$ and $\delta_2 - K + 1$. The probability of making a first choice at an index $m$ and making a second choice at index $j$ is given by $g(m, \delta_1, j - K + 1)$, because the interviewer is guaranteed to choose the best applicant at index $j$ by the sliding rule if the interviewer does not make a choice between $m + 1$ and $j - K$. Therefore, by the total probability theorem, we can add up the probabilities for all possible values of $m$ and find that

\[ \sigma_3(a) = \sigma_3(a - 1) + \frac{1}{N} \sum_{m=\delta_2-K+2}^{a-1} g(m, \delta_1, a - K + 1). \] □

It follows that $\sigma_1(N) + \sigma_2(N) + \sigma_3(N)$ = Pr(Win). In order to normalize $\sigma_3$, we write $\sigma_3(a)$ as follows, where $q = \lceil \frac{a - \delta_2}{K} \rceil$:

\[ \sigma_3(a) = \sigma_3(\delta_2 + (q-1)K) + \frac{1}{N} \sum_{j=\delta_2+(q-1)K+1}^{a} \left( \sum_{m=\delta_2-K+2}^{j-1} g(m, \delta_1, j - K + 1) \right). \]

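We now look at large $K$ and large $N$. We first normalize $f$ as the function $F$ as done in earlier cases, by dividing all indices by $N$ and approximating $f$ with integrals. We similarly normalize $c$ as $C$ and $g$ as $G$. Let $\frac{K}{N} = w$, $\frac{\delta_1}{N} = p_1$, $\frac{\delta_2}{N} = p_2$, $\frac{a}{N} = \alpha$, $\frac{b}{N} = \beta$, $\frac{x}{N} = \gamma$, $\frac{m}{N} = \eta$, and $\frac{j}{N} = \rho$. Then,

\[ F(\gamma, \alpha) = \int_{\gamma + (\lceil \frac{\alpha-\gamma}{w} \rceil - 1)w}^{\alpha} \frac{1}{\rho + w} \left( 1 - \sum_{r=-1}^{\lceil \frac{\rho-\gamma}{w} \rceil - 2} F(\gamma, \gamma + rw) - F(\gamma, \rho - w) \right) d\rho, \]

\[ C(\rho, \gamma) = \frac{1}{\rho + w} \left( 1 - \sum_{r=-1}^{\lceil \frac{\rho-\gamma}{w} \rceil - 2} F(\gamma, \gamma + rw) - F(\gamma, \rho - w) \right), \]

\[ G(\rho, \gamma, \beta) = C(\rho, \gamma) \left( 1 - \sum_{r=-2}^{\lceil \frac{\beta-\rho-w}{w} \rceil} F(\rho + w, \rho + rw) - F(\rho + w, \beta - w) \right). \]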
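We now normalize $\sigma_i$ as $\tau_i$, where $i$ is 1, 2, or 3. We additionally normalize $p_b$ as

\[ P_b = \sum_{r=0}^{\lceil \frac{p_2 - p_1 - w}{w} \rceil - 1} F(p_1, p_1 + rw) + F(p_1, p_2 - w). \]

Then we find that if $q_1 = \lceil \frac{\alpha - p_1}{w} \rceil$ and $q_2 = \lceil \frac{\alpha - p_2}{w} \rceil$,

\[ \tau_1(\alpha) = \tau_1(p_1 + (q_1 - 1)w) + \int_{p_1+(q_1-1)w}^{\alpha} \left( 1 - \sum_{r=-1}^{\lceil \frac{\rho-p_1}{w} \rceil - 2} F(p_1, p_1 + rw) - F(p_1, \rho - w) \right) d\rho, \]

\[ \tau_2(\alpha) = \tau_2(p_2 + (q_2 - 1)w) + P_b \int_{p_2+(q_2-1)w}^{\alpha} \left( 1 - \sum_{r=-1}^{\lceil \frac{\rho-p_2}{w} \rceil - 2} F(p_2, p_2 + rw) - F(p_2, \rho - w) \right) d\rho, \]

\[ \tau_3(\alpha) = \tau_3(p_2 + (q_2 - 1)w) + \int_{p_2+(q_2-1)w}^{\alpha} \left( \int_{p_2-w}^{\rho} G(\eta, p_1, \rho - w) \, d\eta \right) d\rho. \]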


[Figure 7 plot: "Normalized Optimal Thresholds vs. Normalized Window Size for Various N: 2-Choice Case". Horizontal axis: Normalized Window Size w. Plotted series: spline interpolations of the first and second optimal thresholds for N = 100 applicants; first and second optimal thresholds for N = 20 applicants; first and second optimal thresholds for N = 10 applicants.]

Figure 7: The Variation of the Normalized Thresholds $p_1^* = \frac{\delta_1^*}{N}$ and $p_2^* = \frac{\delta_2^*}{N}$ with the Normalized Window Size $w = \frac{K}{N}$ for N = 10, 20, 100

Therefore Pr(Win) $\approx \tau_1(1) + \tau_2(1) + \tau_3(1)$, and as a result, Pr(Win) only depends on $w$, $p_1$, and $p_2$ for large $N$. As with previous cases, the optimal Pr(Win), the normalized first optimal threshold $p_1^*$, and the normalized second optimal threshold $p_2^*$ only depend on the normalized window size $w$. Figure 7 shows how the normalized thresholds depend on the normalized window size $w = \frac{K}{N}$. Similarly, Figure 8 shows how the optimal probability of winning depends on the normalized window size $w = \frac{K}{N}$.
[Figure 8 plot: "How the Optimal Probability of Winning Varies with Normalized Window Size in the 2-Choice Sliding Window Secretary Problem".]

Figure 8: The Variation of the 2-Choice Optimal Probability of Winning with the Normalized Window Size $w = \frac{K}{N}$. For $K \ge \frac{N}{2}$, Pr(Win) = 1, and thus the graph only displays probabilities of winning for $K < \frac{N}{2}$.


6 Conclusions and Directions for Future Research 

We studied the Sliding-Window Secretary Problem for three different cases for a fixed number of applicants: (i) choosing the best, (ii) choosing either the best or second best, and (iii) two choices to choose the best. For each case, we found the maximum probability of winning for any window size, computed the optimal thresholds, and performed asymptotic analysis.

Our results naturally extend to the Top-L case, where the interviewer wins if one of the top L applicants is chosen. As a direction for future research, the sliding window can also be applied to the other classical secretary problems. One example is the problem of finding the best expected rank: Chow et al. [3] have already found the best expected rank for a sliding window of size 1, while Goldys [9] has found the best expected rank for a sliding window of size 2. Another extension is the full-information problem, where the interviewer knows a cardinal score of each applicant and the probability distribution of scores, instead of just relative ranks. This variation is more applicable to realistic situations, because decisions are made not solely based on the ordinal ranks of options but also on the magnitude of the benefit of each option.
Appendix A The Classical Secretary Problem


We present Gilbert and Mosteller’s proofs of the optimal strategy for the secretary problem. 


Theorem A1 (Gilbert and Mosteller (1966) [7]). The solution can be restricted to the strategy in which for some integer $d^* \ge 1$, the interviewer rejects the first $d^*$ applicants and chooses the next applicant who is better than the first $d^*$ applicants.


We present the proof of Theorem A1 to demonstrate the methods used to prove the optimal form of the strategy for the Sliding Window Secretary Problems.


Proof. We shall define a candidate as an applicant such that if the interviewer chooses to accept this applicant, the probability of winning is strictly nonzero.

Suppose we reach a candidate with index $i$. Because candidate $i$ is better than all previous applicants, the probability that candidate $i$ is the best applicant is $\frac{i}{N}$.

Now consider the probability of winning with the optimal strategy given that the interviewer rejects candidate $i$. The probability of winning given that candidate $i$ is rejected decreases as $i$ increases, because the larger $i$ is, the more likely that the best applicant is at $i$.

The interviewer only chooses applicant $i$ if $i$ is a candidate and if the following inequality holds.

\[ \Pr(\mathrm{Win} \mid \text{Choosing Candidate } i) \ge \Pr(\mathrm{Win} \mid \text{Skipping Candidate } i) \qquad (8) \]

Because there is a $\frac{1}{N}$ chance that applicant $N$ is better than applicant $N-1$ if applicant $N-1$ is a candidate, $\Pr(\mathrm{Win} \mid \text{Skipping Candidate } N-1) = \frac{1}{N}$. Therefore, $\Pr(\mathrm{Win} \mid \text{Choosing Candidate } N-1) \ge \Pr(\mathrm{Win} \mid \text{Skipping Candidate } N-1)$. Because the probability of winning given that candidate $i$ is chosen strictly increases in $i$, and the probability of winning given that candidate $i$ is rejected decreases in $i$, there is a greatest integer $d^* \in [0, N-1]$ such that Inequality (8) only holds after $d^*$. □


We now find the optimal threshold $d^*$ for large N, as seen in Gilbert and Mosteller (1966).

Theorem A2 (Gilbert and Mosteller (1966) [7]). For N large, $d^* \approx \frac{N}{e}$.

Proof. Let $d$ be an arbitrary threshold value, and $d^*$ be the value of $d$ that maximizes the probability of winning. Let $\Pr(\mathrm{Win} \mid j)$ be the probability of finding the best-ranked applicant if $j$ is the index of the best-ranked applicant. Then $\Pr(\mathrm{Win} \mid j) = 0$ for $j \le d$, and because the probability of finding the best-ranked applicant after $d$ is equal to the probability of not finding an applicant better than the first $d$ applicants in the first $j - 1$ applicants, $\Pr(\mathrm{Win} \mid j) = \frac{d}{j-1}$ for $j > d$.

Each value of $j$ occurs with probability $\frac{1}{N}$, and thus by the total probability theorem,

\[ \Pr(\mathrm{Win}) = \sum_{j=1}^{N} \frac{\Pr(\mathrm{Win} \mid j)}{N} = \frac{d}{N} \sum_{j=d+1}^{N} \frac{1}{j-1}. \]

Pr(Win) can be converted to an integral for N large:

\[ \Pr(\mathrm{Win}) \approx \frac{d}{N} \int_{d}^{N} \frac{1}{t} \, dt. \]

So $\Pr(\mathrm{Win}) \approx -\frac{d}{N} \log\left(\frac{d}{N}\right)$. The value of $d$ that maximizes Pr(Win) is $\frac{N}{e}$. So $d^* = \frac{N}{e}$ and $\Pr(\mathrm{Win}) = \frac{1}{e}$. □
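The finite-N sum in the proof of Theorem A2 can be evaluated directly and compared with the limiting values N/e and 1/e. The following Python sketch is only an illustration; the function names and the choice N = 1000 are assumptions made for the example.

```python
import math


def classical_win_probability(n, d):
    """Pr(Win) = (d / n) * sum_{j = d + 1}^{n} 1 / (j - 1), valid for 1 <= d < n."""
    return d / n * sum(1.0 / (j - 1) for j in range(d + 1, n + 1))


def best_threshold(n):
    """Threshold d in [1, n - 1] that maximizes the classical win probability."""
    return max(range(1, n), key=lambda d: classical_win_probability(n, d))


if __name__ == "__main__":
    n = 1000
    d_star = best_threshold(n)
    print("d*/N    =", d_star / n)
    print("Pr(Win) =", classical_win_probability(n, d_star))
    print("1/e     =", 1 / math.e)
```

For moderately large N both printed quantities should already lie close to 1/e, which is the behavior the integral approximation predicts.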
Appendix B Optimal Threshold Values for Various Numbers of Applicants and Window Sizes: Choosing the Best Applicant

Number of     Window    Optimal      Probability
Applicants    Size      Threshold    of Win
6             2         1            0.5611
6             3         0 or 1       0.7167
7             2         2            0.5321
7             3         1            0.6690
7             4         0            0.8114
8             2         2            0.5089
8             3         1            0.6199
8             4         0 or 1       0.7405
8             5         0            0.8655
9             2         3            0.4880
9             3         2            0.5741
9             4         1            0.6988
9             5         0            0.8099
9             6         0            0.8988
10            2         3            0.4774
10            3         2            0.5634
10            4         1            0.6566
10            5         0 or 1       0.7544
10            6         0            0.8544
10            7         0            0.9210

Table 1: Optimal Values for The Threshold Value d* Given the Number of Applicants N and Window Size K, along with Respective Probabilities of Success Pr(Win)
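Entries of this kind can be cross-checked for small N by enumerating all orderings of the applicants. The sketch below is a hedged illustration, not the authors' code: it assumes that the interviewer rejects the first d applicants and then chooses the first applicant i > d who is still the best seen at index min(i + K - 1, N). The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def enumerate_best1_win_probability(n, k, d):
    """Exact Pr(Win) for the Best-1 sliding-window rule with threshold d."""
    wins = 0
    total = 0
    # ranks[j - 1] is the overall rank of applicant j (rank 1 is the best).
    for ranks in permutations(range(1, n + 1)):
        total += 1
        for i in range(d + 1, n + 1):
            horizon = min(i + k - 1, n)
            if ranks[i - 1] == min(ranks[:horizon]):
                # Applicant i is chosen; the interviewer wins if i is the best overall.
                if ranks[i - 1] == 1:
                    wins += 1
                break
    return Fraction(wins, total)


if __name__ == "__main__":
    for n, k, d in [(6, 2, 1), (6, 3, 0), (6, 3, 1)]:
        p = enumerate_best1_win_probability(n, k, d)
        print(n, k, d, p, round(float(p), 4))
```

Under these assumptions the N = 6 cases evaluate to 101/180 for K = 2 and 43/60 for K = 3 with either threshold, which round to the tabulated 0.5611 and 0.7167.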
Appendix C Probabilities of Success Given a Threshold d for 100 Applicants and a Window Size of 2: Choosing the Best Applicant

Value of       Probability
Threshold d    of Win
33             0.3760
34             0.3768
35             0.3773
36             0.3775
37             0.3774

Table 2: Probabilities of Success For Certain Thresholds Given N = 100 and K = 2. The optimal d* for K = 2 is 36, which is the optimal d* for the original secretary problem.
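Enumeration is infeasible for N = 100, but a recursion of the kind used for $f$ and $\sigma_1$ in Section 5.2 can be evaluated directly. The following Python sketch assumes that the probability of a first choice at index i equals (1 - S(i - K)) / min(i + K - 1, N), where S(t) is the probability that a choice was made at or before index t, and that the win probability adds (1 - S(i - K)) / N for every i > d. The function name is hypothetical, and the code is an illustration rather than the authors' implementation.

```python
from fractions import Fraction


def recursive_best1_win_probability(n, k, d):
    """Exact Pr(Win) for the Best-1 sliding-window rule, with 0 <= d < n and k >= 1."""
    # chosen_by[t] = probability that a choice was made at some index in (d, t].
    chosen_by = [Fraction(0)] * (n + 1)
    win = Fraction(0)
    for i in range(d + 1, n + 1):
        # A choice at an index in (i - k, i) is incompatible with applicant i being
        # the best when it leaves the window, so only choices up to i - k matter.
        earlier = chosen_by[i - k] if i - k >= 0 else Fraction(0)
        free = 1 - earlier
        win += free / n  # applicant i is the best overall and no earlier choice was made
        horizon = min(i + k - 1, n)
        chosen_by[i] = chosen_by[i - 1] + free / horizon
    return win


if __name__ == "__main__":
    for d in range(33, 38):
        print(d, round(float(recursive_best1_win_probability(100, 2, d)), 4))
```

The printed values can be compared with Table 2. For K = 1 the recursion reduces to the classical formula of Appendix A.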
Appendix D Values of Optimal Thresholds (d*) for Different Window Sizes (k) and Large Number of Applicants (n)

Normalized     Normalized Value
Window Size    of Optimal Threshold
0              0.3679
0.2            0.2635
0.22           0.2494
0.24           0.2347
0.26           0.2193
0.28           0.2033
0.3            0.1867
0.32           0.1696
0.34           0.1520
0.36           0.1341
0.38           0.1158
0.4            0.09716
0.42           0.07823
0.44           0.05903
0.46           0.03958
0.48           0.0199
0.5            0

Table 3: Normalized Threshold Values $\frac{d^*}{N}$ for Select Values of the Normalized Window Size $\frac{K}{N}$
Appendix E Optimal Threshold Values for Various Numbers of Applicants and Window Sizes: Choosing One of the Best Two Applicants

Number of     Window    Optimal        Optimal        Probability
Applicants    Size      Threshold 1    Threshold 2    of Winning
4             2         1              1              0.9167
5             2         1              2              0.8833
5             3         1              1              0.9667
6             2         1              3              0.8333
6             3         1              2              0.9333
6             4         1              1              0.9833
7             2         2              4              0.7929
7             3         1              3              0.8976
7             4         1              2              0.9571
7             5         1              1              0.9905
8             2         2              4              0.7696
8             3         1              3 or 4         0.8595
8             4         1              3              0.9310
8             5         1              2              0.9702
8             6         1              1              0.9940
9             2         2              5              0.7454
9             3         2              4              0.8364
9             4         1              3              0.9052
9             5         1              2              0.9517
9             6         1              1              0.9788
9             7         1              1              0.9960

Table 4: Optimal Values for The Thresholds Given the Number of Applicants and Window Size, along with Respective Probabilities of Success